There has been a lot of recent work on Bayesian methods for reinforcementlearning exhibiting near-optimal online performance. The main obstacle facingsuch methods is that in most problems of interest, the optimal solutioninvolves planning in an infinitely large tree. However, it is possible toobtain stochastic lower and upper bounds on the value of each tree node. Thisenables us to use stochastic branch and bound algorithms to search the treeefficiently. This paper proposes two such algorithms and examines theircomplexity in this setting.
展开▼